How can I cluster short messages [Tweets] based on topic? [Topic Based Clustering]

Posted by Jagira on Stack Overflow
Published on 2010-05-28T16:58:06Z

Hello,

I am planning an application that will group short messages/tweets into clusters based on topic. The number of topics will be limited, for example Sports [NBA, NFL, Cricket, Soccer], Entertainment [movies, music] and so on.

I can think of two approaches to this:

  • Ask users to tag their messages, the way Stack Overflow does for questions. Users select tags from a predefined list, and on the server side I cluster the messages based on their tags. Pros: Simple design. Less complexity in code. Cons: Choices for users will be restricted. Clusters will not be dynamic. If a new event occurs, the predefined tags will miss it.
  • Take the message, remove the stopwords [predefined in a dictionary] and apply some clustering algorithm to form clusters. A cluster is displayed depending on its popularity and is kept for as long as that popularity is sustained. New messages are scanned and assigned to the corresponding clusters. Pros: Dynamic clustering based on the popularity of the event/incident. Cons: Increased complexity. More server resources required.
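
To make the second approach more concrete, here is a minimal sketch of what I have in mind. It is only an illustration under assumptions: the stopword list, the Jaccard similarity measure, the threshold of 0.2 and all names (tokenize, Cluster, assign, popular) are hypothetical choices, not a settled design.

```python
import re
from collections import Counter

# Hypothetical stopword dictionary; a real one would be much larger.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "at",
             "to", "of", "and", "for", "with", "this", "that", "it"}

def tokenize(message):
    """Lowercase the message, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

class Cluster:
    def __init__(self, tokens, message):
        self.term_counts = Counter(tokens)  # how often each term appeared
        self.messages = [message]

    def keywords(self, n=5):
        """The n most frequent terms; ties are broken alphabetically."""
        ranked = sorted(self.term_counts,
                        key=lambda t: (-self.term_counts[t], t))
        return set(ranked[:n])

    def add(self, tokens, message):
        self.term_counts.update(tokens)
        self.messages.append(message)

def assign(clusters, message, threshold=0.2):
    """Put the message into the most similar cluster, or start a new one."""
    tokens = tokenize(message)
    best, best_score = None, 0.0
    for cluster in clusters:
        score = jaccard(tokens, cluster.keywords())
        if score > best_score:
            best, best_score = cluster, score
    if best is not None and best_score >= threshold:
        best.add(tokens, message)
        return best
    new_cluster = Cluster(tokens, message)
    clusters.append(new_cluster)
    return new_cluster

def popular(clusters, min_size=2):
    """Clusters with at least min_size messages, largest first."""
    big = [c for c in clusters if len(c.messages) >= min_size]
    return sorted(big, key=lambda c: len(c.messages), reverse=True)
```

Each incoming message would be passed to assign(), and only the clusters returned by popular() would be displayed. Expiring clusters whose popularity is not sustained is left out of this sketch.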

I would like to know whether there are other approaches to this problem, or ways of improving the two methods above.

Also, please suggest some good clustering algorithms. I think the "K-Nearest Clustering" algorithm is apt for this situation.
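
For reference, the sketch below shows k-means, which may be the algorithm I am thinking of (k-nearest neighbours, by contrast, is a classifier that needs labelled examples). It is a minimal, assumption-laden illustration: it uses cosine similarity on term-frequency vectors, expects tokens that already have stopwords removed, and takes the initial centroids from hypothetical seed_indices instead of choosing them at random.

```python
import math
from collections import Counter

def normalize(counts):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}

def vectorize(tokens):
    """Turn a list of tokens into a unit-length term-frequency vector."""
    return normalize(Counter(tokens))

def cosine(a, b):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def mean_vector(vectors):
    """Sum the member vectors and rescale the result to unit length."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return normalize(total)

def kmeans(vectors, seed_indices, max_iter=20):
    """Return one cluster label per vector.

    seed_indices picks the vectors used as initial centroids, so the
    number of clusters k is len(seed_indices).
    """
    centroids = [dict(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels
```

Note that k-means needs the number of clusters up front, which fits a limited set of topics but not the case where new events keep appearing.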

© Stack Overflow or respective owner
